Comparison of the chloroplast genomes and phylogenomic analysis of Elaeocarpaceae

Background Elaeocarpaceae is a vital family in tropical and subtropical forests. Compared with the important position of Elaeocarpaceae species in forest ecosystem and the concern of medicinal value, the most research on Elaeocarpaceae are classification and taxonomy. Molecular systematics has corrected the morphological misjudgment, and it belongs to Oxalidales. Phylogenetic and divergence time estimates of Elaeocarpaceae is mostly constructed by using chloroplast gene fragments. At present, although there are reports on the chloroplast structure of Elaeocarpaceae, a comprehensive analysis of the chloroplast structure of Elaeocarpaceae is lacking. Methods To understand the variation in chloroplast sequence size and structure in Elaeocarpaceae, the chloroplast genomes of nine species were sequenced using the Illumina HiSeq 2500 platform and further assembled and annotated with Elaeocarpus japonicus and Sloanea sinensis (family Elaeocarpaceae) as references. A phylogenomic tree was constructed based on the complete chloroplast genomes of the 11 species representing five genera of Elaeocarpaceae. Chloroplast genome characteristics were examined by using Circoletto and IRscope software. Results The results revealed the following: (a) The 11 sequenced chloroplast genomes ranged in size from 157,546 to 159,400 bp. (b) The chloroplast genomes of Elaeocarpus, Sloanea, Crinodendron and Vallea lacked the rpl32 gene in the small single-copy (SSC) region. The large single-copy (LSC) region of the chloroplast genomes lacked the ndhK gene in Elaeocarpus, Vallea stipularis, and Aristotelia fruticosa. The LSC region of the chloroplast genomes lacked the infA gene in genus Elaeocarpus and Crinodendron patagua. (c) Through inverted repeat (IR) expansion and contraction analysis, a significant difference was found between the LSC/IRB and IRA/LSC boundaries among these species. Rps3 was detected in the neighboring regions of the LSC and IRb regions in Elaeocarpus. (d) Phylogenomic analysis revealed that the genus Elaeocarpus is closely related to Crinodendron patagua on an independent branch and Aristotelia fruticosa is closely related to Vallea stipularis, forming a clade with the genus Sloanea. Structural comparisons showed that Elaeocarpaceae diverged at 60 Mya, the genus Elaeocarpus diverged 53 Mya and that the genus Sloanea diverged 0.44 Mya. These results provide new insight into the evolution of the Elaeocarpaceae.

Although the taxonomy of Elaeocarpaceae belonged to Oxalidales is widely acknowledged, the relationships within Oxalidales need further study Soltis et al., 2000;Byng et al., 2016). Furthermore, the age of genera and relationships within Elaeocarpaceae are incongruent in previous studies based on multigene phylogenies (Crayn, Rossetto & Maynard, 2006;Heibl & Renner, 2012;Phoon, 2015), hence the age of the genera within Elaeocarpaceae have not been adequately examined.
Here we used the whole chloroplast genome sequences to further explore phylogenetic relationships within Elaeocarpaceae and other relative families in detail. This study aims to (a) test genetic category between different genus within Elaeocarpaceae, (b) determine the relationships of Elaeocarpaceae within Oxalidales, (c) use the molecular data together with the data from the palaeobotanical literature to infer divergence dates and the biogeographic history of the major clades within Oxalidales and Elaeocarpaceae.

Plant material and chloroplast genome sequencing
Leaf materials were sampled from nine species representing five genera of Elaeocarpaceae and collected from field in China and the Royal Botanic Gardens (Table 1). Voucher specimens of the collection were deposited at the Museum of Gannan Normal University, Nanling Herbarium (GNNU; Director: Yifei Xie, xie.yifei2018@gmail.com), Museum of Beijing Forestry University (BJFC) and The Royal Botanic Gardens (K). Total genomic DNA was extracted using the magnetic bead method and then sent to Sino Geno Max Company for next-generation sequencing using the Illumina HiSeq (TM) 2500 platform in Beijing, China and the read length generated from the Illumina platform was 150 bp. The raw data were filtered by cutadapt version 1.9.1 and trimmed by Trimmomatic version 0.39 to remove low-quality bases with the parameters in S1 (Martin, 2011;Bolger, Lohse & Usadel, 2014). Then we obtained clean data and uploaded them to the NCBI SRA database in fastq format (Table 2).

Phylogenomics and molecular clock dating analysis
To infer phylogenetic relationships within the Elaeocarpaceae and other related families, 20 species of six families including Elaeocarpaceae, Cephalotaceae, Brunelliaceae, Oxalidaceae and Connaraceae were compared. The genomes from the six families included 11 new chloroplast genomes and nine published complete chloroplast genomes ( and then ambiguously aligned fragments were removed using Gblocks version 0.91b with the following parameter settings: minimum number of sequences for a conserved/flank position (11/11), maximum number of contiguous non-conserved positions (eight), minimum length of a block (10), allowed half of gap positions (Talavera & Castresana, 2007). Bayesian inference (BI) was performed using MrBayes version 3.2.6 (Ronquist & Huelsenbeck, 2003). The best-fitting DNA substitution model according to the Bayesian information criterion (BIC), GTR (General Time Reversible) + F (Felsenstein) + I (proportion of Invariable sites), was identified by using jModelTest version 2.1. 10 (Darriba et al., 2012;Guindon, Gascuel & Rannala, 2003). Markov chain Monte Carlo simulations (MCMC) were run for 10,000,000 generations. The BI analysis started with a random tree and sampled trees every 1,000 generations. The first 25% of the trees were discarded as burn-in, and the remaining trees were used to generate a majority-rule consensus tree. Besides, we also estimated a maximum likelihood (ML) phylogeny for the genera in RAxML v8.0.0 (Stamatakis, Hoover & Rougemont, 2008), on the CIPRES web server (www.phylo.org). We used the default settings, including a TVM (Transversion model) + R3 (Free Rate three) + F (Felsenstein) model of sequence evolution. Ultrafast bootstrap with 1,000 replicates under iteration of 500 and correlation coefficient of 0.9 are used to infer the ML tree. Then, based on BEAST 1.10.4, a lognormal distribution with an uncorrelated relaxed clock model was run by using the GTR + F + I site model with four gamma categories, with a random starting tree and a Yule speciation process tree prior (Suchard et al., 2018). MCMC was performed with 500 million generations and sampling every 50,000 generations and the effective sample size (ESS) values were confirmed exceeded 200 for all parameters. Then we used the phyutility software to generate an all-compatible consensus tree with the first 25% of the trees as burn-in (Smith & Dunn, 2008). Node ages of the consensus phylogeny were estimated using the TreeAnnotator software (Drummond & Rambaut, 2007;Dexter & Chave, 2016). A total of 95% highest posterior density intervals (HPD) for each node are shown on the tree. Additionally, the phylogeny was calibrated using four fossils, one fossil from a related clade and by setting the split between Sloanea and Vallea to 55 ± 2 Mya (Mayr, 2000). We used the 40 ± 10 Mya split between Vallea and Aristotelia as the calibration point (Heibl & Renner, 2012). Elaeocarpus from the Tasmania in Australia that is about 55 ± 2 Mya old (Hill, 1984). The other fossils are leaves of Rourea (Connaraceae) from Panama, dated to 59 Mya (Graham, 1988). The fossils are used as the ages of the nodes of the tree and applied as calibration points with the normal prior distribution. The tree was viewed and edited by FigTree version 1.4.0 software (http://tree. bio.ed.ac.uk/software/figtree/).

Overall structure
The 11 sequenced Elaeocarpaceae chloroplast genomes showed a quadripartite structure, an LSC region, an SSC region, and a pair of IR regions, with lengths ranging from 157,546 bp (S. sinensis) to 159,400 bp (C. patagua). The length of the LSC region ranged from 85,874 bp (E. japonicus) to 88,413 bp (S. sinensis), that of the IR regions ranged from 25,984 bp (S. sinensis) to 27,437 bp (E. japonicus and E. japonicus var. yunnanensis), and that of the SSC region ranged from 16,981 bp (E. japonicus) to 17,958 bp (C. Patagua). The total GC content of the 11 chloroplast genomes from five representative genera was approximately 37%, while the GC contents of the IR, LSC and SSC regions were approximately 43%, 35% and 31%, respectively. In contrast to the chloroplast genome of A. fruticosa, which had 133 genes, including eight rRNA genes, 37 tRNA genes, and 88 protein-coding genes, the chloroplast genomes of the other four genera had 132 genes, including eight rRNA genes, 37 tRNA genes, and 87 protein-coding genes. A total of 114 unique genes were detected in the chloroplast genome of A. fruticosa, while C. patagua, V. stipularis and the genus Sloanea had 113 unique genes, and genus Elaeocarpus had 111 unique genes (Table 3).

Chloroplast genome comparisons
The five genera shared 111 protein-coding genes, but rpl32 was only detected in the SSC region of the chloroplast genome of A. fruticosa (Fig. 1). The LSC region of the chloroplast genomes lacked ndhK in the genus Elaeocarpus, V. stipularis, and A. fruticosa and lacked infA in the genus Elaeocarpus and C. patagua. ycf68 was found in V. stipularis, A. fruticosa and C. patagua. In addition, synteny was detected in the five genera of Elaeocarpaceae (Fig. 2). A significant degree of synteny was found between V. stipularis and A. fruticosa, Elaeocarpus and Sloanea. However, the synteny between C. patagua and the other four genera was low. Five genera of Elaeocarpaceae were compared, in addition to the species in Elaeocarpus and Sloanea. Two groups, E. angustifolius and E. hainanensis as well as E. japonicus and E. japonicus var. yunnanensis, had more blocks of synteny in the genus Elaeocarpus. Several blocks of synteny were detected in the four chloroplast genomes of the genus Sloanea, suggesting that the four species are similar to each other.

IR expansion and contraction
In the sequenced chloroplast genomes of Elaeocarpaceae, two complete or fragmented copies of rps19 and rpl2 were located at the boundaries between the LSC region and IRa or IRb region in V. stipularis, A. fruticosa, C. patagua and the genus Sloanea (Fig. 3). In contrast, rps3, rpl22 and rpl16 were detected in the neighboring regions of the LSC or IRa or IRb region in the genus Elaeocarpus. The distance between the fragment of ndhF and the boundary of the SSC and IRb regions in E. angustifolius was 370 bp, much greater than that in the chloroplast genomes of other species in the genus Elaeocarpus: E. japonicus var. yunnanensis, E. japonicus and E. angustifolius. Moreover, the lengths of ndhF and ycf1 in E. angustifolius were shorter than those in the other three species. For the genus Sloanea,  the chloroplast genomes of four species, S. sinensis, S. cordifolia, S. dasycarpa and S. longiaculeatae, were generally the same in terms of IR expansion and contraction, with the exception that the length of ycf1 in S. dasycarpa and S. longiaculeatae was greater than that in S. sinensis and S. cordifolia.

Phylogenomics and molecular clock dating analysis
The matrix of complete chloroplast genomes was used to reconstruct a phylogenomic tree of Oxalidales (Fig. 4) The molecular tree also showed the sister relationships of 11 chloroplast genomes from five representative genera of Elaeocarpaceae were highly supported. Clade I, containing the genus Elaeocarpus, was 100% supported and was dated to ca. 53 Mya (HPD:50-56 Mya), and the crown node age of clade II (C. patagua) was dated to ca. 55 Mya (HPD:52-58 Mya). Diversification of clade III, containing the Sloanea alliance (V. stipularis, A. fruticosa

Complete chloroplast structure of Elaeocarpaceae
This study included 11 complete chloroplast genomes for Elaeocarpaceae plants. All these complete chloroplast genomes had a total GC content of 37%, consistent with the low GC content in the chloroplast genomes of other angiosperms. The higher the content of GC is, the higher the density of DNA and the more conserved the chloroplast genome is Do, Kim & Kim (2013). Therefore, variation might occur in the SSC region rather than the IR regions. Comparisons of the 11 plastomes showed the loss of infA in C. patagua and the genus Elaeocarpus, and similar losses or pseudogenization was reported in the 309 complete chloroplast genomes of 24 species of angiosperms (Millen et al., 2001). ndh genes are frequently pseudogenized or lost in plant groups with a degree of heterotrophy due to evolutionary adaptation to excessive water in the environment, as observed in Aneura, Cuscuta, Epifagus, Hydnora, and nonphotosynthetic orchid species and some autotrophic gymnosperms and ferns (DePamphilis & Palmer, 1990;McNeal et al., 2007;Wickett et al., 2008;Wicke et al., 2011;Kim et al., 2015;Naumann et al., 2016). The rpl32 gene was detected in A. fruticosa but not in the other four genera (V. stipularis, C. patagua, the genus Elaeocarpus and the genus Sloanea), which is similar to previously published research about the losses of two genes, infA and rpl32, in Thalictrum coreanum (Park & Jansen, 2015). In summary, the five genera may have experienced different niche expansions. Previous studies have shown that IR boundary regions with large expansions and contractions may be related to double-strand breakage and repair, while small expansions and contractions may be related to gene conversions, which is a common phenomenon in the evolution of the chloroplast genomes (Kim & Lee, 2004;Khakhlova & Bock, 2006;Hansen et al., 2007;Wang et al., 2008;Ma et al., 2013;Liang, Wen & Gao, 2018). We found large IR expansions in the five genera. The genus Elaeocarpus is different from the other four genera at the IR/SC boundary, which may reflect that the genus Sloanea has an older origin and experienced a different evolution event. In addition, rps19 was located across the LSC/IRB regions in four genera, while the boundary of the LSC and IRb regions in the genus Elaeocarpus included rps3. Research shows that the locations of rps19 and rps3 differ between the chloroplasts of monocotyledons and dicotyledons. In some dicotyledons, rps19 only partially exists in the IR region, while the rps3 gene is only found in Paris and Melanthiaceae (Lin et al., 2012;Sarah, Kim & Kim, 2013). Compared with the other four genera, the genus Sloanea experienced different complex evolutionary events.
Homologous fragments have been found via collinearity analysis in various plants, including Capparaceae (Alzahrani et al., 2021), Ranunculaceae (Park & Park, 2021), and Passiflora (Cauz-Santos et al., 2020). The length of homologous fragments is related to the time of divergence between species. The shorter the time of species differentiation is, the more homologous fragments there are (Cheng et al., 2013). According to the similarity of the 11 chloroplast genomes of Elaeocarpaceae, we detected several blocks of synteny between V. stipularis and A. fruticosa, the genus Elaeocarpus and the genus Sloanea, meaning that the times of divergence between the genus Sloanea and the genus Elaeocarpus, V. stipularis and A. fruticosa was similar. Interestingly, there were no blocks of synteny in C. patagua with the other four genera, meaning that the evolution of C. patagua was different from the rest of the genera. In the genus Elaeocarpus and genus Sloanea, it is worth noting that the divergence time of E. japonicus was similar to that of E. japonicus var. yunnanensis and that of E. angustifolius was similar to that of E. hainanensis. In addition, the times of divergence among species in the genus Sloanea were similar.

Phylogenomic relationships and historical biogeography in Oxalidales
Based on the 20 species of six families with available complete chloroplast genomes, a phylogenomic tree of Oxalidales was reconstructed, consistent with the recent phylogeny (Byng et al., 2016;Baker et al., 2021;Li et al., 2021). Elaeocarpacea was clarified as sister to Cephalotaceae and Brunelliaceae and Connaraceae. In addition, Oxalidaceae is far from Elaeocarpaceae as previously recognized, which was recognized by Heibl & Renner (2012). Pillon et al. (2021) phylogeny of Oxalidales based on plastid genomics has been used as data for event-based biogeographic analysis of the world. In that study the likely ancestral area for the Oxalidales is Australia/New Guinea + New Caledonia in Cretaceous, which was consistent with our results. Indeed, the greatest number of extant species and genera are found in Oceania, and particularly in eastern Australia, New Guinea, and New Caledonia (Kershaw, 1976;Kershaw, Bretherton & van der Kaars, 2007;Sniderman, 2011).
The age of the Connaraceae clade with R. microphylla was much older than previously estimated (74 Mya;Heibl & Renner, 2012). The recent discovery of Connarus-like wood from the Paleocene of India, outside the modern range of the family, suggests a possible origin in India during the Cretaceous, when India was an island continent, and subsequently spread throughout the Old World tropics as India docked with Asia (Baas et al., 2017).
The differentiation time of Oxalidaceae is consistent with that of Heibl and Renner, which is about 68 Mya. Geographical distribution patterns suggest the origin of the family in the southern hemisphere, prior to the separation of South America and Africa (Raven & Axelrod, 1974).
The split from Cephalotaceae and Brunelliaceae was estimated at 60 Mya, more recent than Heibl and Renner's research (78 Mya). According to recent research, Brunellia is exclusively American, with only six of the 61 known species occurring north of Panama. Cephalotaceae grows only in the extreme SouthWest of Australia (Matthews & Endress, 2002). The presence of Brunelliaceae and Cephalotaceae may indicate that the genera may have been represented north of Panama before the closing of the central American land bridge (Coode, 2004).
As for the differentiation time of Elaeocarpaceae, it has long been postulated that the family Elaeocarpaceae originated in the southern hemisphere, of which only Elaeocarpus and Sloanea reach the northern hemisphere (Raven & Axelrod, 1974). The crown age of Elaeocarpaceae estimated in this study based on molecular data was younger than the age previously estimated at 79.62-85.2 Mya (Magallon et al., 2015;Phoon, 2015), 64-66 Mya (Wikström, Savolainen & Chase, 2001), 67 Mya (Heibl & Renner, 2012) and 100 Mya (Crayn, Rossetto & Maynard, 2006), older than 38 Mya (Harris & Davies, 2016). These differences may be due to the choice of DNA markers and the accuracy of the fossil calibrations of molecular evolutionary rates. The earliest divergence within the Elaeocarpaceae appears to have occurred in the late Cretaceous based on our data, which is broadly coincident with the time when the western (Africa and South America) and eastern (Australia, Antarctica, Madagascar and India) parts of Gondwana were separating (Ali & Aitchison, 2008).

Phylogenomic relationships and historical biogeography in Elaeocarpaceae
Within Elaeocarpaceae, the 11 taxa were separated into the following groups in our study: the Sloanea alliance (V. stipularis, A. fruticosa and Sloanea), Elaeocarpus alliance and C. patagua alliance. The phylogenomic placements are consistent with those in Phoon's research (Phoon, 2015). One major challenge in previous studies of the phylogenetic relationships between and within Elaeocarpaceae was the focus on DNA markers (trnL-trnF region and trnV-ndhC region) rather than complete chloroplast genomes (Maynard, 2004;Baba, 2013;Phoon, 2015). Furthermore, the DNA markers exhibited low sequence variability, leading to insufficiently resolved phylogenies within Elaeocarpus, and there no phylogenetic tree was constructed for Sloanea. Our phylogenomic analysis strongly confirmed the preliminary results of previous studies but with higher robustness, and the results of the analysis improved the posterior probabilities of all clades (Maynard, 2004;Baba, 2013;Phoon, 2015).
Compared with the differentiation of V. stipularis and A. fruticosa in Phoon's study, the age of the split between V. stipularis and A. fruticosa was much younger than previously estimated at 37 Mya (Phoon, 2015). The results of the present study agreed with Coode's phylogenetic reconstruction in which Vallea and A. were sisters and Phoon's finding that the ancestors may have dispersed between western and eastern Gondwana. The minimum estimates of divergence times between V. stipularis and A. fruticosa because the divergence of the South American and New Zealand lineages at 24-27 and 3 Mya respectively, postdates the isolation of their respective landmasses (McLoughlin, 2001).
Crinodendron was resolved in this study as an independent branch. The split from Elaeocarpaceae was estimated at 55 Mya, more recent than Phoon's estimate (59 Mya). The divergence of Crinodendron is estimated to have occurred during the Paleo-Eocene, but the origin of the genus is almost certainly older given the position of Dubouzetia brasiliense (from a dwarf cloud forest near the Atlantic coast of Brazil) as sister to the rest of the genus, based on morphological data (Coode, 2004).
Elaeocarpus represents a widespread lineage in Elaeocarpaceae that diverged 53 Mya, which was older than Phoon's estimate (40 Mya). Divergence time analysis suggests that Elaeocarpus split in the Eocene and migrated out of Australia to the surrounding regions mostly in the Oligocene and the Miocene, although the taxon sampling without species of Southeast Asian in this clade has led to these dates being doubted (Crayn, Rossetto & Maynard, 2006;Heibl & Renner, 2012;Phoon, 2015).
Divergence time analysis suggests that Sloanea diverged from its sister species V. stipularis and A. fruticosa at 0.4 Mya, more recent than Phoon's estimate (29 Mya). The reason may be that Sloanea in China belongs to a branch of the Slonaea genus, and the differentiation time is late.
Overall, the divergence times of all genera in Elaeocarpaceae inferred using the complete chloroplast genomes were more accurate than those inferred using DNA markers (trnL-trnF region and trnV-ndhC region).

CONCLUSIONS
In the present research, the chloroplast genomes of nine species were assembled using Illumina high-throughput sequencing data. The genomic structures of the 11 samples were compared and analyzed. On this basis, we concluded that the chloroplast genome structure and gene size in Elaeocarpaceae showed some difference. In addition, we determined the relationships of Elaeocarpaceae within Oxalidales and inferred divergence dates and the biogeographic history of the major clades within Oxalidales and Elaeocarpaceae. These results are consistent with previous studies reporting relationship in Elaeocarpaceae and provide new insight into the evolution of the Elaeocarpaceae.
Yifei Xie conceived and designed the experiments, performed the experiments, analyzed the data, authored or reviewed drafts of the article, and approved the final draft. Jiayi Jin performed the experiments, prepared figures and/or tables, and approved the final draft. Jinyue Li performed the experiments, analyzed the data, prepared figures and/or tables, and approved the final draft. Xiangdong Qiu performed the experiments, prepared figures and/or tables, and approved the final draft. Yang Tong performed the experiments, prepared figures and/or tables, and approved the final draft. Zhongyang Li conceived and designed the experiments, authored or reviewed drafts of the article, and approved the final draft. Zhixiang Zhang conceived and designed the experiments, authored or reviewed drafts of the article, and approved the final draft. Wenling Lai analyzed the data, authored or reviewed drafts of the article, and approved the final draft.

Supplemental Information
Supplemental information for this article can be found online at http://dx.doi.org/10.7717/ peerj.15322#supplemental-information.